Mining Historical XML
Authors
Abstract
The Web is now the largest data repository ever available in the history of humankind (Reis et al., 2004). However, the availability of a huge amount of Web data does not mean that users can find what they want more easily. On the contrary, the sheer volume of data on the Web has overwhelmed their ability to find the desired information. It has been claimed that 99% of the data reachable on the Web is useless to 99% of the users (Han & Kamber, 2000, p. 436). That is, an individual may be interested in only a tiny fragment of the Web data. Nevertheless, the size and diversity of Web data make it a rich and unprecedented source for data mining. Web mining was introduced to discover hidden knowledge from Web data and services automatically (Etzioni, 1996). According to the type of Web data, Web mining can be classified into three categories: Web content mining, Web structure mining, and Web usage mining (Madria et al., 1999). Web content mining extracts patterns from online information such as HTML files, e-mails, or images (Dumais & Chen, 2000; Ester et al., 2002). Web structure mining analyzes the link structures of Web data, which can be inter-links among different Web documents (Kleinberg, 1998) or intra-links within an individual Web document (Arasu & Hector, 2003; Lerman et al., 2004). Web usage mining discovers interesting usage patterns from the secondary data derived from the interactions of users while surfing the Web (Srivastava et al., 2000; Cooley, 2003). More recently, XML has become widely used as a standard for data exchange on the Internet. Existing work on XML data mining includes frequent substructure mining (Inokuchi et al., 2000; Kuramochi & Karypis, 2001; Zaki, 2002; Yan & Han, 2003; Huan et al., 2003), classification (Zaki & Aggarwal, 2003; Huan et al., 2004), and association rule mining (Braga et al., 2002).
Because data in many different domains can be represented as XML documents, XML data mining can be useful in applications such as bioinformatics, chemistry, and network analysis (Deshpande et al., 2003; Huan et al., 2004).
Similar Resources
Mining Frequently Changing Substructures from Historical Unordered XML Documents
Recently, there has been increasing research effort in XML data mining. These efforts have largely assumed that XML documents are static. However, in many real applications, XML data are evolutionary in nature. In this paper, we focus on mining evolution patterns from historical XML documents. Specifically, we propose a novel approach to discover frequently changing structures (FCS) from a sequence of...
XML structural delta mining: Issues and challenges
Recently, there has been increasing research effort in XML data mining. These research efforts have largely assumed that XML documents are static. However, in reality, the documents are rarely static. In this paper, we propose a novel research problem called XML structural delta mining. The objective of XML structural delta mining is to discover knowledge by analyzing structural evolution pattern (als...
Mining Association Rules from Structural Deltas of Historical XML Documents
Previous work on XML association rule mining focuses on mining from the data existing in XML documents at a certain point in time. However, due to the dynamic nature of online information, an XML document typically evolves over time. Knowledge obtained from mining the evolution of an XML document would be useful in a wide range of applications, such as XML indexing and XML clustering. In this paper,...
FRACTURE mining: Mining frequently and concurrently mutating structures from historical XML documents
In the past few years, the fast proliferation of available XML documents has stimulated a great deal of interest in discovering hidden and nontrivial knowledge from XML repositories. However, to the best of our knowledge, none of the existing work on XML mining has taken into account the dynamic nature of XML documents as online information. The present article proposes a novel type of frequent pat...
Finding NECTARs from Evolutionary Trees
Mining trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. These efforts largely assumed that the trees are static. However, in many real applications, tree data are evolutionary in nature. In this paper, we focus on mining evolution patterns from historical tree-structured data. Specifically, we propose a novel approach to discover negativel...
FRECLE Mining: Discovering Frequent Semantic Tree Cluster Sequences from Historical Tree Structured Data
Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semistructured data, and so on. Existing techniques focus on finding “structural” patterns and ignore the “semantics” that may be associated with the subtrees. In this paper we propose an algorithm to mine a novel pattern called frequent semantic tree cluster sequences (FRECLE), which captures the frequent...